52 research outputs found

    On the Non-Gaussianity of Sea Surface Elevations

    Sea surface elevations are generally described in the current literature as non-Gaussian processes, being considered Gaussian only over short periods of relatively low wave heights. The objective here is to study how the distribution of the sea surface elevation evolves from Gaussian to non-Gaussian as the length of the period over which the associated time series is recorded increases. To do this, an empirical study is performed, based on measurements from buoys off the US coast downloaded on an arbitrarily chosen day. The study rejects the null hypothesis of Gaussianity in fewer than 25% of the cases for short periods of time and in over 95% of the cases for long periods of time. The analysis relates to a recent one by the author in which the heights of sea waves are proved to be non-Gaussian. It is similar in that the Gaussianity of the process is studied as a whole, and not just that of its one-dimensional marginal, as is common in the literature. It differs, however, in that the analysis of the sea surface elevations is harder from a statistical point of view, as the one-dimensional marginals can be Gaussian, which is observed throughout the study, and in that a longitudinal study is performed here. A.N.-R. was supported by grant MTM2017-86061-C2-2-P funded by MCIN/AEI/10.13039/501100011033 and “ERDF A way of making Europe”.

    A Topologically Valid Definition of Depth for Functional Data

    The main focus of this work is on providing a formal definition of statistical depth for functional data on the basis of six properties, recognising topological features such as continuity, smoothness and contiguity. Amongst our depth-defining properties is one that addresses the delicate challenge of inherent partial observability of functional data, with fulfillment giving rise to a minimal guarantee on the performance of the empirical depth beyond the idealised and practically infeasible case of full observability. As an incidental product, functional depths satisfying our definition achieve a robustness that is commonly ascribed to depth, despite the absence of a formal guarantee in the multivariate definition of depth. We demonstrate the fulfillment or otherwise of our properties for six widely used functional depth proposals, thereby providing a systematic basis for selection of a depth function.

    Statistical Depth for Text Data: An Application to the Classification of Healthcare Data

    This manuscript introduces a new concept of statistical depth function: the compositional D-depth. It is the first data depth developed exclusively for text data, in particular for data vectorized according to a frequency-based criterion, such as the tf-idf (term frequency–inverse document frequency) statistic, which results in most vector entries taking a value of zero. The proposed data depth consists of taking the inverse discrete Fourier transform of the vectorized text fragments and then applying a statistical depth for functional data, D. This depth is intended to address the sparsity of numerical features that results from transforming qualitative text data into quantitative data, a common procedure in most natural language processing frameworks. Indeed, this sparsity hinders the use of traditional statistical depths and machine learning techniques for classification purposes. To demonstrate the potential value of this new proposal, it is applied to a real-world case study that involves mapping Consolidated Framework for Implementation and Research (CFIR) constructs to qualitative healthcare data. It is shown that the DDG-classifier yields competitive results and outperforms all studied traditional machine learning techniques (logistic regression with LASSO regularization, artificial neural networks, decision trees, and support vector machines) when used in combination with the newly defined compositional D-depth. Funding: A.N.-R. is supported by Grant 21.VP67.64662 funded by “Proyectos Puente 2022” from the Spanish Government of Cantabria. For H.L.R., the qualitative data used in the study were funded by Instituto de Salud Carlos III through the project “PI17/02070” (co-funded by the European Regional Development Fund/European Social Fund “A way to make Europe”/“Investing in your future”) and the Basque Government Department of Health project “2017111086”. The funding bodies had no role in the design of the study, the collection, analysis, or interpretation of data, or the writing of the manuscript.
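The transform step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the vector `tfidf` is a made-up example, the helper `inverse_dft` is a hypothetical name, and the functional depth D that would be applied afterwards is left out. The sketch only shows that a mostly-zero vector becomes a dense sequence after an inverse discrete Fourier transform.

```python
import cmath


def inverse_dft(values):
    """Inverse discrete Fourier transform of a real vector (direct O(n^2) sum)."""
    n = len(values)
    return [
        sum(v * cmath.exp(2j * cmath.pi * k * m / n) for m, v in enumerate(values)) / n
        for k in range(n)
    ]


# Hypothetical tf-idf vector for one text fragment: most entries are zero.
tfidf = [0.0, 0.0, 0.7, 0.0, 0.0, 0.0, 0.3, 0.0]

# The transformed fragment is a complex-valued sequence of the same length.
curve = inverse_dft(tfidf)

# Count non-zero entries before and after the transform.
nonzero_before = sum(1 for v in tfidf if v != 0.0)
nonzero_after = sum(1 for c in curve if abs(c) > 1e-12)
```

For this particular vector, two non-zero entries become eight, which is the kind of densification the abstract relies on before a functional depth is applied.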

    Supervised Classification of Healthcare Text Data Based on Context-Defined Categories

    Achieving a good success rate in supervised classification analysis of a text dataset, where the relationship between the text and its label can be extracted from the context but not from isolated words in the text, is still an important challenge facing the fields of statistics and machine learning. For this purpose, we present a novel mathematical framework. We then conduct a comparative study between established classification methods for the case where the relationship between the text and the corresponding label is clearly depicted by specific words in the text. In particular, we use logistic LASSO, artificial neural networks, support vector machines, and decision-tree-like procedures. This methodology is applied to a real case study involving mapping Consolidated Framework for Implementation and Research (CFIR) constructs to health-related text data and achieves a prediction success rate of over 80% when just the first 55% of the text, or more, is used for training and the remainder for testing. The results indicate that the methodology can be useful to accelerate the CFIR coding process. A.N.-R. is supported by Grant MTM2017-86061-C2-2-P funded by “ERDF A way of making Europe” and MCIN/AEI/10.13039/501100011033. For H.L.R., this study was funded by Instituto de Salud Carlos III through the project “PI17/02070” (co-funded by the European Regional Development Fund/European Social Fund “A way to make Europe”/“Investing in your future”) and the Basque Government Department of Health project “2017111086”. The funding bodies had no role in the design of the study, the collection, analysis, or interpretation of data, or the writing of the manuscript. The APC was paid by PI17/0207

    Properties of Statistical Depth with Respect to Compact Convex Random Sets: The Tukey Depth

    We study a statistical data depth with respect to compact convex random sets, which is consistent with the multivariate Tukey depth and the Tukey depth for fuzzy sets. In addition, it provides a different perspective on the existing halfspace depth with respect to compact convex random sets. In studying this depth function, we provide a series of properties for the statistical data depth with respect to compact convex random sets. These properties are an adaptation of properties that constitute the axiomatic notions of multivariate, functional, and fuzzy depth functions and other well-known properties of depth. For L.G.-D.L.F. and A.N.-R., this research was supported by grant MTM2017-86061-C2-2-P funded by MCIN/AEI/10.13039/501100011033 and “ERDF A way of making Europe”. P.T. was supported by the Ministerio de Economía y Competitividad grant MTM2015-63971-P, the Ministerio de Ciencia, Innovación y Universidades grant PID2019-104486GB-I00, and the Consejería de Empleo, Industria y Turismo del Principado de Asturias grant GRUPIN-IDI2018-000132.

    Statistical depth for fuzzy sets

    Statistical depth functions provide a way to order the elements of a space by their centrality in a probability distribution. This has been very successful for generalizing non-parametric order-based statistical procedures from univariate to multivariate and (more recently) to functional spaces. We introduce two general definitions of statistical depth which are adapted to fuzzy data. For that purpose, two concepts of symmetric fuzzy random variables are introduced and studied. Furthermore, a generalization of Tukey's halfspace depth to the fuzzy setting is presented and proved to satisfy the above notions, through a detailed study of its properties. A. Nieto-Reyes and L. Gonzalez are supported by the Spanish Ministerio de Economía, Industria y Competitividad grant MTM2017-86061-C2-2-P. P. Terán is supported by the Ministerio de Economía y Competitividad grant MTM2015-63971-P, the Ministerio de Ciencia, Innovación y Universidades grant PID2019-104486GB-I00 and the Consejería de Empleo, Industria y Turismo del Principado de Asturias grant GRUPIN-IDI2018-000132.
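As a small illustration of how a depth function orders points by centrality, the sketch below computes the empirical Tukey halfspace depth in the simplest, univariate case only. It is not the fuzzy generalization introduced in the paper; the function name `tukey_depth_1d` and the sample values are assumptions made for the example.

```python
def tukey_depth_1d(x, sample):
    """Empirical univariate Tukey (halfspace) depth of x with respect to sample.

    It is the smaller of the two fractions of sample points lying
    at or below x and at or above x.
    """
    n = len(sample)
    at_or_below = sum(1 for s in sample if s <= x)
    at_or_above = sum(1 for s in sample if s >= x)
    return min(at_or_below, at_or_above) / n


# Hypothetical sample; the median is the deepest point.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
depths = {x: tukey_depth_1d(x, sample) for x in sample}

# Centre-outward ordering: deepest points first.
ranked = sorted(sample, key=lambda x: depths[x], reverse=True)
```

Here the median 3.0 is ranked first and the extremes 1.0 and 5.0 last, while a point outside the range of the sample receives depth zero.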

    Classification of Alzheimer's patients through ubiquitous computing

    Functional data analysis and artificial neural networks are the building blocks of the proposed methodology, which distinguishes the movement patterns among Alzheimer's patients at different stages of the disease and classifies new patients into their appropriate stage of the disease. The movement patterns are obtained from the accelerometer of the Android smartphones that the patients carry while moving freely. The proposed methodology is relevant in that it is flexible regarding the type of data to which it is applied. To exemplify that, a novel real three-dimensional functional dataset is analyzed in which each datum is observed in a different time domain. Not only is each datum observed at a different frequency, but the domain of each datum also has a different length. The obtained classification success rate of 83% indicates the potential of the proposed methodology. This work was partially supported by project PAC::LFO of the Spanish Programa Estatal de Fomento de la Investigación Científica y Técnica de Excelencia under grant MTM2014-55262-P, and by the Spanish Ministerio de Economía y Competitividad under grant MTM2014-56235-C2-2-P. We gratefully acknowledge the “Asociación de Familiares de Enfermos de Alzheimer en Cantabria” and Pablo Cobo García for their participation in the various studies.